There are two species (A and B), so species is actually a variable and A and B are its values
Values of a variable should not be used as column headers
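Here is a minimal pandas sketch of the fix. It assumes the messy version had one column per species (a hypothetical layout with columns Plot, A, B). `melt` folds the species headers back into a single Species variable:

```python
import pandas as pd

# Hypothetical messy layout (assumed): species values A and B are used as
# column headers instead of being values in a Species column.
messy = pd.DataFrame({
    "Plot": [1, 2],
    "A": [3.5, 2.8],
    "B": [1.2, 4.2],
})

# melt() keeps Plot as an identifier and stacks the A/B columns into
# a Species column (the old header) and a Weight column (the cell value).
tidy = (
    messy.melt(id_vars="Plot", var_name="Species", value_name="Weight")
    .sort_values(["Plot", "Species"])  # order rows by plot, then species
    .reset_index(drop=True)
)
print(tidy)
```

The result has one row per plot-and-species combination, with the same four rows as the tidy table.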
Same Data, Now Tidy
Plot | Species | Weight
1    | A       | 3.5
1    | B       | 1.2
2    | A       | 2.8
2    | B       | 4.2
Each row is one observation (one species in one plot)
Queries are easier (e.g., filtering by species or totaling weight per plot)
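To make "queries are easier" concrete, here is a short pandas sketch run against the tidy table. Each query is a single filter or group-by on a named column, with no reshaping first:

```python
import pandas as pd

# The tidy table: one row per (plot, species) observation.
tidy = pd.DataFrame({
    "Plot": [1, 1, 2, 2],
    "Species": ["A", "B", "A", "B"],
    "Weight": [3.5, 1.2, 2.8, 4.2],
})

# Filter: every observation of species A is one boolean mask.
species_a = tidy[tidy["Species"] == "A"]

# Summaries: group on a variable column and aggregate Weight.
total_by_plot = tidy.groupby("Plot")["Weight"].sum()
mean_by_species = tidy.groupby("Species")["Weight"].mean()

print(species_a)
print(total_by_plot)
print(mean_by_species)
```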
Other Qualities of Tidy Data
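Units are not included in cells with data (put them in the column name or metadata)
Visual indicators (colors, fonts, italics) are not used to convey information
Consistent names
Consistent date formats
Short, descriptive language (avoid abstract codes)
Use one consistent value for missing data (e.g., NaN, -9999, or blank; pandas reads blank and NaN as missing by default, but -9999 must be declared)
Each piece of data is stored in only one table
Saved in a plain-text format (e.g., CSV)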
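Several of these qualities show up when a file is loaded and saved with pandas. The sketch below uses a hypothetical CSV (the Date column and its values are illustrative assumptions). It declares -9999 as missing, checks that every date follows one ISO 8601 format, and writes the result back out as plain-text CSV with a single missing-value marker:

```python
import io
import pandas as pd

# Hypothetical CSV: one missing weight coded as -9999, another left blank,
# and dates written consistently as YYYY-MM-DD.
raw_csv = """Plot,Species,Weight,Date
1,A,3.5,2023-06-01
1,B,-9999,2023-06-01
2,A,,2023-06-02
2,B,4.2,2023-06-02
"""

# Blank cells become NaN by default; -9999 is only treated as missing
# because it is listed in na_values.
df = pd.read_csv(io.StringIO(raw_csv), na_values=["-9999"])

# An explicit format raises a ValueError if any date deviates from it.
df["Date"] = pd.to_datetime(df["Date"], format="%Y-%m-%d")

# Save as plain-text CSV, writing every missing value the same way.
clean_csv = df.to_csv(index=False, na_rep="NaN")
print(clean_csv)
```

Choosing "NaN" for `na_rep` is just one option. The point is that both originally different missing codes now come out identical.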
Tidy Data Exercise
Do the exercise in Improving Messy Data (https://docs.google.com/spreadsheets/d/13YnHHDjG_xEaoHJqvVI2U8cJeNgeOyAJ/edit?usp=drive_link&ouid=110380986407305818312&rtpof=true&sd=true)
How to Discover Data
Scientific Data Discovery | Streaming Video
Informally between researchers | Your mom’s emails
Via a project or institutional website | A link at nbc.com
Referenced in a journal article | Via a blog review
Discoverable within a specialized archive or repository | AppleTV or Netflix
Discoverable in a network of repositories (Data.gov, DataONE) | IMDB
LTER (Long Term Ecological Research) Data Management Requirements
Sites must have an integrated Information Management System
Data must be available online within two years of collection